Goto

Collaborating Authors

 Knowledge Engineering


K-DECORE: Facilitating Knowledge Transfer in Continual Structured Knowledge Reasoning via Knowledge Decoupling

Neural Information Processing Systems

Continual Structured Knowledge Reasoning (CSKR) focuses on training models to handle sequential tasks, where each task involves translating natural language questions into structured queries grounded in structured knowledge. Existing general continual learning approaches face significant challenges when applied to this task, including poor generalization to heterogeneous structured knowledge and inefficient reasoning due to parameter growth as tasks increase. To address these limitations, we propose a novel CSKR framework, K-DECORE, which operates with a fixed number of tunable parameters. Unlike prior methods, K-DECORE introduces a knowledge decoupling mechanism that disentangles the reasoning process into task-specific and task-agnostic stages, effectively bridging the gaps across diverse tasks. Building on this foundation, K-DECORE integrates a dualperspective memory consolidation mechanism for distinct stages and introduces a structure-guided pseudo-data synthesis strategy to further enhance the model's generalization capabilities. Extensive experiments on four benchmark datasets demonstrate the superiority of K-DECORE over existing continual learning methods across multiple metrics, leveraging various backbone large language models.


Turning the Tables: Enabling Backward Transfer via Causal-Aware LoRA in Continual Learning

Neural Information Processing Systems

Current parameter-efficient fine-tuning (PEFT) methods have shown superior performance in continual learning. However, most existing PEFT-based methods focus on mitigating catastrophic forgetting by limiting modifications to the old task model caused by new tasks. This hinders backward knowledge transfer, as when new tasks have a strong positive correlation with old tasks, appropriately training on new tasks can transfer beneficial knowledge to old tasks. Critically, achieving backward knowledge transfer faces two fundamental challenges: (1) some parameters may be ineffective on task performance, which constrains the task solution space and model capacity; (2) since old task data are inaccessible, modeling task correlation via shared data is infeasible. To address these challenges, we propose CaLoRA, a novel causal-aware low-rank adaptation framework that is the first PEFT-based continual learning work with backward knowledge transfer. Specifically, we first propose parameter-level counterfactual attribution (PaCA) that estimates the causal effect of LoRA parameters via counterfactual reasoning, identifying effective parameters from a causal view. Second, we propose cross-task gradient adaptation (CaGA) to quantify task correlation by gradient projection and evaluate task affinity based on gradient similarity. By incorporating causal effect, task correlation, and affinity, CaGA adaptively adjusts task gradients, facilitating backward knowledge transfer without relying on data replay. Extensive experiments across multiple benchmarks and continual learning settings show that CaLoRA outperforms stateof-the-art methods.


Merging on the Fly Without Retraining: ASequential Approach to Scalable Continual Model Merging

Neural Information Processing Systems

Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their specialized capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight interpolation-based methods being the predominant approach. However, these conventional approaches are not well-suited for scenarios where models become available sequentially, and they often suffer from high memory requirements and potential interference between tasks. In this study, we propose a training-free projection-based continual merging method that processes models sequentially through orthogonal projections of weight matrices and adaptive scaling mechanisms. Our method operates by projecting new parameter updates onto subspaces orthogonal to existing merged parameter updates while using an adaptive scaling mechanism to maintain stable parameter distances, enabling efficient sequential integration of task-specific knowledge. Our approach maintains constant memory complexity to the number of models, minimizes interference between tasks through orthogonal projections, and retains the performance of previously merged models through adaptive task vector scaling. Extensive experiments on CLIP-ViT models demonstrate that our method achieves a 5-8% average accuracy improvement while maintaining robust performance in different task orderings. Code is publicly available at https://github.com/tanganke/opcm/.


UniHG: ALarge-scale Universal Heterogeneous Graph Dataset and Benchmark for Representation Learning and Cross-Domain Transferring

Neural Information Processing Systems

Irregular data in the real world are usually organized as heterogeneous graphs consisting of multiple types of nodes and edges. However, current heterogeneous graph research confronts three fundamental challenges: i) Benchmark Deficiency, ii) Semantic Disalignment, and iii) Propagation Degradation. In this paper, we construct a large-scale, universal, and joint multi-domain heterogeneous graph dataset named UniHG to facilitate heterogeneous graph representation learning and cross-domain knowledge mining. Overall, UniHG contains 77.31 million nodes and 564 million directed edges with thousands of labels and attributes, which is currently the largest universal heterogeneous graph dataset available to the best of our knowledge. To perform effective learning and provide comprehensively benchmarks on UniHG, two key measures are taken, including i) the semantic alignment strategy for multi-attribute entities, which projects the feature description of multi-attribute nodes and edges into a common embedding space to facilitate information aggregation; ii) proposing the novel Heterogeneous Graph Decoupling (HGD) framework with a specifically designed Anisotropy Feature Propagation (AFP) module for learning effective multi-hop anisotropic propagation kernels. These two strategies enable efficient information propagation among a tremendous number of multi-attribute entities and meanwhile mine multi-attribute association adaptively through the multi-hop aggregation in large-scale heterogeneous graphs. Comprehensive benchmark results demonstrate that our model significantly outperforms existing methods with an accuracy improvement of 28.93%. And the UniHG can facilitate downstream tasks, achieving an NDCG@20 improvement rate of 11.48% and 11.71%.


Order-Level Attention Similarity Across Language Models: A Latent Commonality

Neural Information Processing Systems

In this paper, we explore an important yet previously neglected question: Do context aggregation patterns across Language Models (LMs) share commonalities? While some works have investigated context aggregation or attention weights in LMs, they typically focus on individual models or attention heads, lacking a systematic analysis across multiple LMs to explore their commonalities. In contrast, we focus on the commonalities among LMs, which can deepen our understanding of LMs and even facilitate cross-model knowledge transfer. In this work, we introduce the Order-Level Attention (OLA) derived from the order-wise decomposition of Attention Rollout and reveal that the OLA at the same order across LMs exhibits significant similarities. Furthermore, we discover an implicit mapping between OLA and syntactic knowledge. Based on these two findings, we propose the Transferable OLA Adapter (TOA), a training-free cross-LM adapter transfer method. Specifically, we treat the OLA as a unified syntactic feature representation and train an adapter that takes OLA as input.


Towards Single-Source Domain Generalized Object Detection via Causal Visual Prompts

Neural Information Processing Systems

Single-source Domain Generalized Object Detection (SDGOD), as a cutting-edge research topic in computer vision, aims to enhance model generalization capability in unseen target domains through single-source domain training. Current mainstream approaches attempt to mitigate domain discrepancies via data augmentation techniques. However, due to domain shift and limited domain specific knowledge, models tend to fall into the pitfall of spurious correlations. This manifests as the model's over-reliance on simplistic classification features (e.g., color) rather than essential domain-invariant representations like object contours. To address this critical challenge, we propose the Cauvis (Causal Visual Prompts) method. First, we introduce a Cross-Attention Prompts module that mitigates bias from spurious features by integrating visual prompts with cross-attention. To address the inadequate domain knowledge coverage and spurious feature entanglement in visual prompts for single-domain generalization, we propose a dual-branch adapter that disentangles causal-spurious features while achieving domain adaptation via high-frequency feature extraction.


Disentangling and mitigating the impact of task similarity for continual learning

Neural Information Processing Systems

Continual learning of partially similar tasks poses a challenge for artificial neural networks, as task similarity presents both an opportunity for knowledge transfer and a risk of interference and catastrophic forgetting.However, it remains unclear how task similarity in input features and readout patterns influences knowledge transfer and forgetting, as well as how they interact with common algorithms for continual learning.Here, we develop a linear teacher-student model with latent structure and show analytically that high input feature similarity coupled with low readout similarity is catastrophic for both knowledge transfer and retention. Conversely, the opposite scenario is relatively benign. Our analysis further reveals that task-dependent activity gating improves knowledge retention at the expense of transfer, while task-dependent plasticity gating does not affect either retention or transfer performance at the over-parameterized limit. In contrast, weight regularization based on the Fisher information metric significantly improves retention, regardless of task similarity, without compromising transfer performance. Nevertheless, its diagonal approximation and regularization in the Euclidean space are much less robust against task similarity. We demonstrate consistent results in a permuted MNIST task with latent variables. Overall, this work provides insights into when continual learning is difficult and how to mitigate it.


A Practitioner's Guide to Continual Multimodal Pretraining

Neural Information Processing Systems

However, practical model deployment often operates in the gap between these two limit cases, as real-world applications demand adaptation to specific subdomains, tasks or concepts -- spread over the entire, varying life cycle of a model.


Agent Planning with World Knowledge Model

Neural Information Processing Systems

Imitating humans' mental world knowledge model which provides global prior knowledge before the task and maintains local dynamic knowledge during the task, in this paper, we introduce parametric W orld K nowledge M odel ( WKM) to facilitate agent